AITopics | basis model

Country:

Asia > Singapore (0.40)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing SystemsFeb-7-2026, 11:25:04 GMT

Identification of Nonlinear Latent Hierarchical Models Lingjing Kong

Classical causal structure learning algorithms often assume no latent confounders. However, it is usually impossible to enumerate and measure all task-related variables in real-world scenarios. Neglecting latent confounders may lead to spurious correlations among observed variables.

artificial intelligence, latent variable, machine learning, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Neural Information Processing SystemsOct-10-2025, 15:47:29 GMT

Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising Gongfan Fang Xinyin Ma Xinchao Wang National University of Singapore

The goal of multi-expert denoising is to increase the overall capacity of diffusion models while keeping an acceptable overhead.

basis model, coefficient, diffusion model, (15 more...)

Country:

Asia > Singapore (0.40)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing SystemsOct-8-2025, 01:17:18 GMT

065ef23a944b3995de7dd4a3e203d133-Paper-Conference.pdf

artificial intelligence, latent variable, machine learning, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Fang, Gongfan, Ma, Xinyin, Wang, Xinchao

Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising

arXiv.org Artificial IntelligenceDec-7-2024

Transformer-based diffusion models have achieved significant advancements across a variety of generative tasks. However, producing high-quality outputs typically necessitates large transformer models, which result in substantial training and inference overhead. In this work, we investigate an alternative approach involving multiple experts for denoising, and introduce Remix-DiT, a novel method designed to enhance output quality at a low cost. The goal of Remix-DiT is to craft N diffusion experts for different denoising timesteps, yet without the need for expensive training of N independent models. To achieve this, Remix-DiT employs K basis models (where K < N) and utilizes learnable mixing coefficients to adaptively craft expert models. This design offers two significant advantages: first, although the total model size is increased, the model produced by the mixing operation shares the same architecture as a plain model, making the overall model as efficient as a standard diffusion transformer. Second, the learnable mixing adaptively allocates model capacity across timesteps, thereby effectively improving generation quality. Experiments conducted on the ImageNet dataset demonstrate that Remix-DiT achieves promising results compared to standard diffusion transformers and other multiple-expert methods. The code is available at https://github.com/VainF/Remix-DiT.

artificial intelligence, machine learning, natural language, (19 more...)

2412.05628

Country:

Asia > Singapore (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Brusatin, Simone, Padoan, Tommaso, Coletta, Andrea, Gatti, Domenico Delli, Glielmo, Aldo

Simulating the economic impact of rationality through reinforcement learning and agent-based modelling

arXiv.org Artificial IntelligenceMay-3-2024

Agent-based models (ABMs) are simulation models used in economics to overcome some of the limitations of traditional frameworks based on general equilibrium assumptions. However, agents within an ABM follow predetermined, not fully rational, behavioural rules which can be cumbersome to design and difficult to justify. Here we leverage multi-agent reinforcement learning (RL) to expand the capabilities of ABMs with the introduction of fully rational agents that learn their policy by interacting with the environment and maximising a reward function. Specifically, we propose a 'Rational macro ABM' (R-MABM) framework by extending a paradigmatic macro ABM from the economic literature. We show that gradually substituting ABM firms in the model with RL agents, trained to maximise profits, allows for a thorough study of the impact of rationality on the economy. We find that RL agents spontaneously learn three distinct strategies for maximising profits, with the optimal strategy depending on the level of market competition and rationality. We also find that RL agents with independent policies, and without the ability to communicate with each other, spontaneously learn to segregate into different strategic groups, thus increasing market power and overall profits. Finally, we find that a higher degree of rationality in the economy always improves the macroeconomic environment as measured by total output, depending on the specific rational policy, this can come at the cost of higher instability. Our R-MABM framework is general, it allows for stable multi-agent learning, and represents a principled and robust direction to extend existing economic simulators.

agent, competition, rl agent, (16 more...)

2405.02161

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)

Genre: Research Report (1.00)

Industry: Banking & Finance > Economy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine LearningOct-31-2023

Identification of Nonlinear Latent Hierarchical Models

Kong, Lingjing, Huang, Biwei, Xie, Feng, Xing, Eric, Chi, Yuejie, Zhang, Kun

Identifying latent variables and causal structures from observational data is essential to many real-world applications involving biological data, medical data, and unstructured data such as images and languages. However, this task can be highly challenging, especially when observed variables are generated by causally related latent variables and the relationships are nonlinear. In this work, we investigate the identification problem for nonlinear latent hierarchical causal models in which observed variables are generated by a set of causally related latent variables, and some latent variables may not have observed children. We show that the identifiability of causal structures and latent variables (up to invertible transformations) can be achieved under mild assumptions: on causal structures, we allow for multiple paths between any pair of variables in the graph, which relaxes latent tree assumptions in prior work; on structural functions, we permit general nonlinearity and multi-dimensional continuous variables, alleviating existing work's parametric assumptions. Specifically, we first develop an identification criterion in the form of novel identifiability guarantees for an elementary latent variable model. Leveraging this criterion, we show that both causal structures and latent variables of the hierarchical model can be identified asymptotically by explicitly constructing an estimation procedure. To the best of our knowledge, our work is the first to establish identifiability guarantees for both causal structures and latent variables in nonlinear latent hierarchical models.

artificial intelligence, latent variable, machine learning, (16 more...)

arXiv.org Machine Learning

2306.07916

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Nooralinejad, Parsa, Abbasi, Ali, Koohpayegani, Soroush Abbasi, Meibodi, Kossar Pourahmadi, Khan, Rana Muhammad Shahroz, Kolouri, Soheil, Pirsiavash, Hamed

PRANC: Pseudo RAndom Networks for Compacting deep models

arXiv.org Artificial IntelligenceAug-28-2023

We demonstrate that a deep model can be reparametrized as a linear combination of several randomly initialized and frozen deep models in the weight space. During training, we seek local minima that reside within the subspace spanned by these random models (i.e., `basis' networks). Our framework, PRANC, enables significant compaction of a deep model. The model can be reconstructed using a single scalar `seed,' employed to generate the pseudo-random `basis' networks, together with the learned linear mixture coefficients. In practical applications, PRANC addresses the challenge of efficiently storing and communicating deep models, a common bottleneck in several scenarios, including multi-agent learning, continual learners, federated systems, and edge devices, among others. In this study, we employ PRANC to condense image classification models and compress images by compacting their associated implicit neural networks. PRANC outperforms baselines with a large margin on image classification when compressing a deep model almost $100$ times. Moreover, we show that PRANC enables memory-efficient inference by generating layer-wise weights on the fly. The source code of PRANC is here: \url{https://github.com/UCDvision/PRANC}

artificial intelligence, deep learning, machine learning, (19 more...)

2206.08464

Country: North America > United States > California > Yolo County > Davis (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Government > Military (0.68)
Health & Medicine (0.46)
Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Xu, Mengda, Ganesh, Sumitra, Pasula, Pranay

Mixture of basis for interpretable continual learning with distribution shifts

arXiv.org Artificial IntelligenceJan-5-2022

Continual learning in environments with shifting data distributions is a challenging problem with several real-world applications. In this paper we consider settings in which the data distribution(task) shifts abruptly and the timing of these shifts are not known. Furthermore, we consider a semi-supervised task-agnostic setting in which the learning algorithm has access to both task-segmented and unsegmented data for offline training. We propose a novel approach called mixture of Basismodels (MoB) for addressing this problem setting. The core idea is to learn a small set of basis models and to construct a dynamic, task-dependent mixture of the models to predict for the current task. We also propose a new methodology to detect observations that are out-of-distribution with respect to the existing basis models and to instantiate new models as needed. We test our approach in multiple domains and show that it attains better prediction error than existing methods in most cases while using fewer models than other multiple model approaches. Moreover, we analyze the latent task representations learned by MoB and show that similar tasks tend to cluster in the latent space and that the latent representation shifts at the task boundaries when tasks are dissimilar.

arxiv template, basis model, task partition, (13 more...)

2201.01853

Genre: Research Report (1.00)

Industry:

Banking & Finance (0.94)
Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)